Abstract: We consider the compound capacity of polar codes
under successive cancellation decoding for a collection of
binary-input memoryless output-symmetric channels. By deriving a
sequence of upper and lower bounds, we show that in general the
compound capacity under successive decoding is strictly smaller
than the unrestricted compound capacity.



I. History and Motivation 

Polar codes, recently introduced by Arikan [1], are a family
of codes that achieve the capacity of a large class of channels
using low-complexity encoding and decoding algorithms. The
complexity of these algorithms scales as $O(N \log N)$, where
$N$ is the blocklength of the code. Recently, it has been
shown that, in addition to being capacity-achieving for channel 
coding, polar-like codes are also optimal for lossy source 
coding as well as multi-terminal problems like the Wyner-Ziv 
and the Gelfand-Pinsker problem [2]. 

Polar codes are closely related to Reed-Muller (RM) codes. 
The rows of the generator matrix of a polar code of length
$N = 2^n$ are chosen from the rows of the matrix
$G^{\otimes n} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes n}$,
where $\otimes$ denotes the Kronecker product. The crucial difference
between polar codes and RM codes is in the choice of the rows. For
RM codes the rows of largest weight are chosen, whereas for 
polar codes the choice is dependent on the channel. We refer 
the reader to [1] for a detailed discussion on the construction 
of polar codes. The decoding is done using a successive 
cancellation (SC) decoder. This algorithm decodes the bits 
one-by-one in a pre-chosen order. 
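As a minimal sketch of the matrix $G^{\otimes n}$ described above, the following pure-Python snippet builds the Kronecker power over GF(2) and encodes by XOR-ing selected rows. The function names `kron_power` and `encode` are illustrative assumptions and do not come from [1]; the choice of rows (the actual polar code construction) is left to the caller.

```python
def kron_power(n):
    """Return G^{(x) n} over GF(2) for G = [[1, 0], [1, 1]] as a list of rows."""
    g = [[1]]
    for _ in range(n):
        # One Kronecker step: G (x) g = [[g, 0], [g, g]].
        top = [row + [0] * len(row) for row in g]
        bottom = [row + row for row in g]
        g = top + bottom
    return g


def encode(info_bits, rows, n):
    """XOR together the generator rows selected by the nonzero information bits.

    `rows` lists the row indices that carry information (hypothetical input;
    for a polar code this choice depends on the channel).
    """
    gen = kron_power(n)
    codeword = [0] * (1 << n)
    for bit, r in zip(info_bits, rows):
        if bit:
            codeword = [a ^ b for a, b in zip(codeword, gen[r])]
    return codeword


# Row weights for n = 3; an RM code would keep the rows of largest weight.
weights = [sum(row) for row in kron_power(3)]
```

The row of index $i$ has weight $2^{\mathrm{wt}(i)}$, where $\mathrm{wt}(i)$ is the number of ones in the binary expansion of $i$, which is why the RM rule can be stated purely in terms of row weights.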

Consider a communication scenario where the transmitter 
and the receiver do not know the channel. The only knowledge 
they have is the set of channels to which the channel belongs. 
This is known as the compound channel scenario. Let $\mathcal{W}$
denote the set of channels. The compound capacity of $\mathcal{W}$
is defined as the rate at which we can reliably transmit
irrespective of the particular channel (out of $\mathcal{W}$) that is chosen.
The compound capacity is given by [3]

$$C(\mathcal{W}) = \max_{P} \inf_{W \in \mathcal{W}} I_P(W),$$

where $I_P(W)$ denotes the mutual information between the
input and the output of $W$, with the input distribution being $P$.
Note that the compound capacity of $\mathcal{W}$ can be strictly smaller
than the infimum of the individual capacities. This happens
if the capacity-achieving input distributions for the individual
channels are different. On the other hand, if the capacity-achieving
input distribution is the same for all channels in
$\mathcal{W}$, then the compound capacity is equal to the infimum of
the individual capacities. This is indeed the case here, since we
restrict our attention to the class of binary-input memoryless
output-symmetric (BMS) channels.
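The distinction between the compound capacity and the infimum of the individual capacities can be checked numerically. The sketch below approximates $\max_P \inf_W I_P(W)$ by a grid search over the input distribution; the grid search, the function names, and the example channels (a Z-channel, its mirror image, a BSC and a BEC) are assumptions made for illustration only.

```python
from math import log2


def mutual_information(p1, channel):
    """I_P(W) for P(X = 1) = p1 and transition probabilities channel[x][y]."""
    px = [1.0 - p1, p1]
    ny = len(channel[0])
    py = [sum(px[x] * channel[x][y] for x in range(2)) for y in range(ny)]
    total = 0.0
    for x in range(2):
        for y in range(ny):
            joint = px[x] * channel[x][y]
            if joint > 0.0:
                total += joint * log2(channel[x][y] / py[y])
    return total


def compound_capacity(channels, grid=1000):
    """Grid-search approximation of max over P of min over W of I_P(W)."""
    return max(min(mutual_information(k / grid, w) for w in channels)
               for k in range(grid + 1))


def capacity(channel, grid=1000):
    return compound_capacity([channel], grid)


z_channel = [[1.0, 0.0], [0.5, 0.5]]       # input 1 is flipped w.p. 0.5
s_channel = [[0.5, 0.5], [0.0, 1.0]]       # mirror image: input 0 is flipped
bsc = [[0.89, 0.11], [0.11, 0.89]]         # symmetric
bec = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]   # outputs: 0, erasure, 1

asymmetric_pair = compound_capacity([z_channel, s_channel])
symmetric_pair = compound_capacity([bsc, bec])
```

For the two asymmetric channels the optimal input distributions differ, so the compound capacity falls below both individual capacities. For the two symmetric channels the uniform input is optimal for both, and the compound capacity equals the smaller individual capacity.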

We are interested in the maximum achievable rate using
polar codes and SC decoding. We refer to this as the compound
capacity using polar codes and denote it as $C_{P,SC}(\mathcal{W})$. More
precisely, given a collection $\mathcal{W}$ of BMS channels we are
interested in constructing a polar code of rate $R$ which works
well (under SC decoding) for every channel in this collection.
This means, given a target block error probability, call it $P_B$,
we ask whether there exists a polar code of rate $R$ such that
its block error probability is at most $P_B$ for any channel in $\mathcal{W}$.
In particular, how large can we make $R$ so that a construction
exists for any $P_B > 0$?

We consider the compound capacity with respect to ignorance
at the transmitter, but we allow the decoder to have
knowledge of the actual channel.

II. Basic Polar Code Constructions 

Rather than describing the standard construction of polar 
codes, let us give here an alternative but entirely equivalent 
formulation. For the standard view we refer the reader to [1]. 

Binary polar codes have length $N = 2^n$, where $n$ is an
integer. Under successive decoding, there is a BMS channel
associated to each bit $U_i$ given the observation vector $Y_0^{N-1}$
as well as the values of the previous bits $U_0^{i-1}$. This channel
has a fairly simple description in terms of the underlying BMS
channel $W$.¹

Definition 1 (Tree Channels of Height $n$): Consider the
following $N = 2^n$ tree channels of height $n$. Let $\sigma_1 \dots \sigma_n$
be the $n$-bit binary expansion of $i$. E.g., we have for $n = 3$,
$0 = 000$, $1 = 001$, $7 = 111$. Let $\sigma = \sigma_1 \sigma_2 \dots \sigma_n$. Note
that for our purpose it is slightly more convenient to denote
the least (most) significant bit as $\sigma_n$ ($\sigma_1$). Each tree channel
consists of $n + 1$ levels, namely $0, \dots, n$. It is a complete
binary tree. The root is at level $n$. At level $j$ we have $2^{n-j}$
nodes. For $1 \le j \le n$, if $\sigma_j = 0$ then all nodes on level
$j$ are check nodes; if $\sigma_j = 1$ then all nodes on level $j$ are
variable nodes. All nodes at level $0$ correspond to independent
observations of the output of the channel $W$, assuming that
the input is $0$.
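The bookkeeping in Definition 1 can be sketched in a few lines. The helper names below (`tree_type`, `level_kinds`, `nodes_per_level`) are assumptions introduced for illustration; they only restate the definition, with $\sigma_1$ the most significant bit and level $j$ governed by $\sigma_j$.

```python
def tree_type(i, n):
    """n-bit binary expansion sigma_1 ... sigma_n of i (sigma_1 most significant)."""
    return format(i, "0{}b".format(n))


def level_kinds(sigma):
    """Node kind on each level j = 1..n: sigma_j = 0 -> check, 1 -> variable."""
    return {j: ("check" if bit == "0" else "variable")
            for j, bit in enumerate(sigma, start=1)}


def nodes_per_level(n):
    """Number of nodes on levels j = 0..n of the complete binary tree: 2^(n-j)."""
    return [2 ** (n - j) for j in range(n + 1)]


# The tree of type 011 (i = 3, n = 3): check nodes on level 1,
# variable nodes on levels 2 and 3, and 8 leaf observations on level 0.
kinds = level_kinds(tree_type(3, 3))
```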

An example for $W^{011}$ (that is, $n = 3$ and $\sigma = 011$) is shown
in Figure 1.

Let us call $\sigma = \sigma_1 \dots \sigma_n$ the type of the tree. We have
$\sigma \in \{0, 1\}^n$. Let $W^\sigma$ be the channel associated to the tree
of type $\sigma$. Then $I(W^\sigma)$ denotes the corresponding capacity.
Further, by $Z(W^\sigma)$ we mean the corresponding Bhattacharyya
functional (see [4, Chapter 4]).

¹ We note that in order to arrive at this description we crucially use the
fact that $W$ is symmetric. This allows us to assume that $U_0^{i-1}$ is the
all-zero vector.

Fig. 1. Tree representation of the channel $W^{011}$. The 3-bit binary expansion
of 3 is $\sigma_1 \sigma_2 \sigma_3 = 011$.

Consider the channels $W_N^{(i)}$ introduced by Arikan in [1].
The channel $W_N^{(i)}$ has input $U_i$ and output $(Y_0^{N-1}, U_0^{i-1})$.
Without proof we note that $W_N^{(i)}$ is equivalent to the channel
$W^\sigma$ introduced above if we let $\sigma$ be the $n$-bit binary expansion
of $i$.

Given the description of $W^\sigma$ in terms of a tree channel, it
is clear that we can use density evolution [4] to compute the
channel law of $W^\sigma$. Indeed, assuming that infinite-precision
density evolution has unit cost, it was shown in [5] that the
total cost of computing all channel laws is linear in $N$.

When using density evolution it is convenient to represent
the channel in the log-likelihood domain. We refer the reader
to [4] for a detailed description of density evolution. The BMS
channel $W$ is represented as a probability distribution over
$\mathbb{R} \cup \{\pm\infty\}$. The probability distribution is the
distribution of the variable

$$\log \frac{W(Y \mid 0)}{W(Y \mid 1)}, \quad \text{where } Y \sim W(y \mid 0).$$

Density evolution starts at the leaf nodes, which are the
channel observations, and proceeds up the tree. We have
two types of convolutions, namely the variable convolution
(denoted by $\circledast$) and the check convolution (denoted by $\boxplus$).
All the densities corresponding to nodes which are at the same
level are identical. Each node in the $j$-th level is connected
to two nodes in the $(j-1)$-th level. Hence the convolution
(depending on $\sigma_j$) of two identical densities in the $(j-1)$-th
level yields the density in the $j$-th level. If $\sigma_j = 0$, then we
use a check convolution ($\boxplus$), and if $\sigma_j = 1$, then we use a
variable convolution ($\circledast$).

Example 2 (Density Evolution): Consider the channel
shown in Figure 1. By some abuse of notation, let $W$ also
denote the initial density corresponding to the channel $W$.
Recall that $\sigma = 011$. Then the density corresponding to $W^{011}$
(the root node) is given by

$$((W^{\boxplus 2})^{\circledast 2})^{\circledast 2} = (W^{\boxplus 2})^{\circledast 4}.$$
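The level-by-level recursion of Example 2 can be sketched for channels with finitely many outputs. The representation below is an assumption chosen for brevity, not the log-likelihood representation used in the text: a BMS channel is stored as a finite mixture of BSCs, keyed by $d = 1 - 2p$ for a BSC($p$) component. Under this assumption a check level multiplies the $d$ values, and a variable level splits each pair into an "agree" and a "disagree" component. All function names are illustrative.

```python
from math import log2, sqrt


def bsc(p):
    """BSC(p) as a point mass at d = 1 - 2p."""
    return {1.0 - 2.0 * p: 1.0}


def bec(eps):
    """BEC(eps): mass eps at d = 0 (erasure) and 1 - eps at d = 1."""
    return {0.0: eps, 1.0: 1.0 - eps}


def _add(dist, d, w):
    key = round(min(d, 1.0), 12)   # merge numerically identical mass points
    dist[key] = dist.get(key, 0.0) + w


def check_conv(a, b):
    out = {}
    for d1, w1 in a.items():
        for d2, w2 in b.items():
            _add(out, d1 * d2, w1 * w2)
    return out


def var_conv(a, b):
    out = {}
    for d1, w1 in a.items():
        for d2, w2 in b.items():
            agree = (1.0 + d1 * d2) / 2.0    # both hard decisions coincide
            differ = (1.0 - d1 * d2) / 2.0
            _add(out, (d1 + d2) / (1.0 + d1 * d2), w1 * w2 * agree)
            if differ > 0.0:
                _add(out, abs(d1 - d2) / (1.0 - d1 * d2), w1 * w2 * differ)
    return out


def tree_channel(w, sigma):
    """Density of W^sigma; sigma_1 acts first, on the level next to the leaves."""
    for bit in sigma:
        w = check_conv(w, w) if bit == "0" else var_conv(w, w)
    return w


def bhattacharyya(w):
    return sum(p * sqrt(max(0.0, 1.0 - d * d)) for d, p in w.items())


def h2(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)


def capacity(w):
    return sum(p * (1.0 - h2((1.0 - d) / 2.0)) for d, p in w.items())


# W^{011} for W = BEC(0.5): erasure probability 0.5 -> 0.75 -> 0.5625 -> 0.3164...
root = tree_channel(bec(0.5), "011")
```

The number of mass points can grow quickly with the height of the tree, so this exact representation is only practical for small $n$; a quantized density evolution would be needed beyond that.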



III. Main Results 

Consider two BMS channels P and Q. We are interested 
in constructing a common polar code of rate R (of arbitrarily 
large block length) which allows reliable transmission over 
both channels. 

Trivially,

$$C_{P,SC}(P, Q) \le \min\{I(P), I(Q)\}. \tag{1}$$



We will see shortly that, properly applied, this simple fact can 
be used to give tight bounds. 

For the lower bound we claim that

$$C_{P,SC}(P, Q) \ge C_{P,SC}(\text{BEC}(Z(P)), \text{BEC}(Z(Q))) = 1 - \max\{Z(P), Z(Q)\}. \tag{2}$$

To see this claim, we proceed as follows. Consider a particular 
computation tree of height n with observations at its leaf 
nodes from a BMS channel with Bhattacharyya constant $Z$.
What is the largest value that the Bhattacharyya constant of 
the root node can take on? From the extremes of information 
combining framework ([4, Chapter 4]) we can deduce that 
we get the largest value if we take the BMS channel to 
be the BEC(Z). This is true, since at variable nodes the 
Bhattacharyya constant acts multiplicatively for any channel, 
and at check nodes the worst input distribution is known to 
be the one from the family of BEC channels. Further, BEC 
densities stay preserved within the computation graph. 

The above considerations give rise to the following transmission
scheme. We signal on those channels $W^\sigma$ which are
reliable for the BEC($\max\{Z(P), Z(Q)\}$). A fortiori these
channels are also reliable for the actual input distribution.
In this way we can achieve a reliable transmission at rate
$1 - \max\{Z(P), Z(Q)\}$.
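The transmission scheme above can be sketched for a finite height $n$ using the closed-form BEC recursion (erasure probability $z \mapsto 1 - (1 - z)^2$ at a check level and $z \mapsto z^2$ at a variable level). The reliability threshold and the function names are assumptions made for this sketch; the text only requires that the selected channels be reliable for the BEC.

```python
def bec_tree_erasure(z, n):
    """Erasure probabilities of all 2^n tree channels built from BEC(z),
    keyed by the type sigma_1 ... sigma_n."""
    values = {"": z}
    for _ in range(n):
        nxt = {}
        for sigma, e in values.items():
            nxt[sigma + "0"] = 1.0 - (1.0 - e) ** 2   # new top level: check
            nxt[sigma + "1"] = e * e                  # new top level: variable
        values = nxt
    return values


def signaling_set(z_p, z_q, n, threshold):
    """Types whose tree channel is reliable for BEC(max{Z(P), Z(Q)})."""
    worst = max(z_p, z_q)
    return sorted(s for s, e in bec_tree_erasure(worst, n).items()
                  if e <= threshold)


# Hypothetical parameters: Z(P) = 0.5, Z(Q) = 0.6258, height 10, threshold 0.01.
good = signaling_set(0.5, 0.6258, 10, 0.01)
rate = len(good) / 2 ** 10
```

At finite $n$ the rate of the selected set stays below $1 - \max\{Z(P), Z(Q)\}$; the fraction of reliable channels approaches this value only as $n$ grows.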

Example 3 (BSC and BEC): Let us apply the above-mentioned
bounds to $C_{P,SC}(P, Q)$, where $P = \text{BEC}(0.5)$ and
$Q = \text{BSC}(0.11002)$. We have

$$I(P) = I(Q) = 0.5,$$
$$Z(\text{BEC}(0.5)) = 0.5,$$
$$Z(\text{BSC}(0.11002)) = 2\sqrt{0.11002\,(1 - 0.11002)} \approx 0.6258.$$

The upper bound (1) and the lower bound (2) then translate
to

$$C_{P,SC}(P, Q) \le \min\{0.5, 0.5\} = 0.5,$$
$$C_{P,SC}(P, Q) \ge 1 - \max\{0.6258, 0.5\} = 0.3742.$$

Note that the upper bound is trivial, but the lower bound is
not.
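The numbers in Example 3 follow from the standard closed forms for the BSC and the BEC. The sketch below recomputes them; the function names are illustrative assumptions.

```python
from math import log2, sqrt


def h2(x):
    """Binary entropy function in bits."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)


def bsc_capacity(p):
    return 1.0 - h2(p)


def bsc_bhattacharyya(p):
    return 2.0 * sqrt(p * (1.0 - p))


def bec_capacity(eps):
    return 1.0 - eps


def bec_bhattacharyya(eps):
    return eps


# Trivial bounds (1) and (2) for P = BEC(0.5), Q = BSC(0.11002).
upper = min(bec_capacity(0.5), bsc_capacity(0.11002))
lower = 1.0 - max(bec_bhattacharyya(0.5), bsc_bhattacharyya(0.11002))
```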

In some special cases the best achievable rate is easy to 
determine. This happens in particular if the two channels are 
ordered by degradation. 

Example 4 (BSC and BEC Ordered by Degradation):
Let $P = \text{BEC}(0.22004)$ and $Q = \text{BSC}(0.11002)$.
We have $I(P) = 1 - 0.22004 = 0.77996$ and $I(Q) = 0.5$. Further,
one can check that the BSC(0.11002) is degraded
with respect to the BEC(0.22004). This implies that
any sub-channel of type $\sigma$ which is good for the
BSC(0.11002) is also good for the BEC(0.22004). Hence,
$C_{P,SC}(\text{BEC}(0.22004), \text{BSC}(0.11002)) = I(Q) = 0.5$.

More generally, if the set of channels $\mathcal{W}$ is such that there is
a channel $W \in \mathcal{W}$ which is degraded with respect to every
channel in $\mathcal{W}$, then $C_{P,SC}(\mathcal{W}) = C(\mathcal{W}) = I(W)$. Moreover,
the sub-channels $\sigma$ that are good for $W$ are good also for all
channels in $\mathcal{W}$.

So far we have looked at seemingly trivial upper and lower
bounds on the compound capacity of two channels. As we
will see now, it is quite simple to considerably tighten the
result by considering individual branches of the computation
tree separately.

Theorem 5 (Bounds on Pairwise Compound Rate): Let $P$
and $Q$ be two BMS channels. Then for any $n \in \mathbb{N}$

$$C_{P,SC}(P, Q) \le \frac{1}{2^n} \sum_{\sigma \in \{0,1\}^n} \min\{I(P^\sigma), I(Q^\sigma)\},$$

$$C_{P,SC}(P, Q) \ge 1 - \frac{1}{2^n} \sum_{\sigma \in \{0,1\}^n} \max\{Z(P^\sigma), Z(Q^\sigma)\}.$$

Further, the upper as well as the lower bounds converge to the
compound capacity as $n$ tends to infinity, and the bounds are
monotone with respect to $n$.

Proof: Consider all $N = 2^n$ tree channels. Note that
there are $2^{n-1}$ such channels that have $\sigma_1 = 0$ and $2^{n-1}$ such
channels that have $\sigma_1 = 1$. Recall that, by Definition 1, $\sigma_1$
corresponds to the type of node at level 1, the level adjacent to
the leaf observations.

This level transforms the original channel $P$ into $P^0$ and
$P^1$, respectively. Consider first the $2^{n-1}$ tree channels that
correspond to $\sigma_1 = 1$. Instead of thinking of each tree as a
tree of height $n$ with observations from the channel $P$, think of
each of them as a tree of height $n - 1$ with observations coming
from the channel $P^1$. By applying our previous argument, we
see that if we let $n$ tend to infinity then the common capacity
for this half of channels is at most $0.5 \min\{I(P^1), I(Q^1)\}$.
Clearly the same argument can be made for the second half
of channels. This improves the trivial upper bound (1) to

$$C_{P,SC}(P, Q) \le 0.5 \min\{I(P^1), I(Q^1)\} + 0.5 \min\{I(P^0), I(Q^0)\}.$$

Clearly the same argument can be applied to trees of any
height $n$. This explains the upper bound on the compound
capacity of the form $\min\{I(P^\sigma), I(Q^\sigma)\}$.

In the same way we can apply this argument to the lower
bound (2).

From the basic polarization phenomenon we know that for
every $\delta > 0$ there exists an $n \in \mathbb{N}$ so that

$$\frac{1}{2^n} \left| \{\sigma \in \{0, 1\}^n : I(P^\sigma) \in [\delta, 1 - \delta]\} \right| \le \delta/4.$$

Equivalent statements hold for $I(Q^\sigma)$, $Z(P^\sigma)$, and $Z(Q^\sigma)$.

In words, except for at most a fraction $\delta$, all channel pairs
$(P^\sigma, Q^\sigma)$ have "polarized." For each polarized pair both the
upper as well as the lower bound are loose by at most $\delta$.
Therefore, the gap between the upper and lower bound is at
most $(1 - \delta) 2\delta + \delta$.
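Polarization is easy to observe numerically for the BEC, where the capacity of a tree channel evolves in closed form ($I \mapsto I^2$ at a check level and $I \mapsto 1 - (1 - I)^2$ at a variable level). The sketch below measures the fraction of unpolarized tree channels; the choice of channel, the value of $\delta$, and the function name are assumptions for illustration.

```python
def unpolarized_fraction(eps, n, delta):
    """Fraction of the 2^n tree channels of BEC(eps) whose capacity
    lies in the interval [delta, 1 - delta]."""
    caps = [1.0 - eps]
    for _ in range(n):
        nxt = []
        for c in caps:
            nxt.append(c * c)                   # check level
            nxt.append(1.0 - (1.0 - c) ** 2)    # variable level
        caps = nxt
    return sum(1 for c in caps if delta <= c <= 1.0 - delta) / len(caps)


# The unpolarized fraction shrinks as the height of the tree grows.
fractions_by_height = {n: unpolarized_fraction(0.5, n, 0.1) for n in (0, 4, 8, 16)}
```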

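To see that the bounds are monotone consider a particular
type $\sigma$ of length $n$. Then we have

$$\min\{I(P^\sigma), I(Q^\sigma)\}$$

$$= \min\{\tfrac{1}{2}(I(P^{\sigma 0}) + I(P^{\sigma 1})), \tfrac{1}{2}(I(Q^{\sigma 0}) + I(Q^{\sigma 1}))\}$$

$$\ge \tfrac{1}{2} \min\{I(P^{\sigma 0}), I(Q^{\sigma 0})\} + \tfrac{1}{2} \min\{I(P^{\sigma 1}), I(Q^{\sigma 1})\}.$$

A similar argument applies to the lower bound. ∎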
Remark: In general there is no finite $n$ so that either the upper
or the lower bound agrees exactly with the compound capacity. On
the positive side, the lower bounds are constructive and give
an actual strategy to construct polar codes of this rate.

Example 6 (Compound Rate of BSC($\delta$) and BEC($\epsilon$)):
Let us compute upper and lower bounds on
$C_{P,SC}(\text{BSC}(0.11002), \text{BEC}(0.5))$. Note that both the
BSC(0.11002) as well as the BEC(0.5) have capacity
one-half. Applying the bounds of Theorem 5 we get:

n              0      1      2      3      4      5      6
upper bound    0.500  0.482  0.482  0.482  0.482  0.482  0.482
lower bound    0.374  0.407  0.427  0.440  0.449  0.456  0.461

These results suggest that the numerical value of
$C_{P,SC}(\text{BSC}(0.11002), \text{BEC}(0.5))$ is close to 0.482.
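The first columns of this table can be reproduced with a small exact computation. The sketch below is an illustration under stated assumptions, not the procedure used to generate the table: each channel is stored as a finite mixture of BSCs keyed by $d = 1 - 2p$, a check level multiplies the $d$ values, and a variable level splits each pair into an "agree" and a "disagree" component. The number of mass points grows quickly with $n$, so only small heights are practical this way.

```python
from itertools import product
from math import log2, sqrt


def bsc(p):
    return {1.0 - 2.0 * p: 1.0}


def bec(eps):
    return {0.0: eps, 1.0: 1.0 - eps}


def conv(a, b, kind):
    """One tree level: kind '0' is a check level, '1' a variable level."""
    out = {}

    def add(d, w):
        if w > 0.0:
            key = round(min(d, 1.0), 12)
            out[key] = out.get(key, 0.0) + w

    for d1, w1 in a.items():
        for d2, w2 in b.items():
            if kind == "0":
                add(d1 * d2, w1 * w2)
            else:
                add((d1 + d2) / (1.0 + d1 * d2), w1 * w2 * (1.0 + d1 * d2) / 2.0)
                if d1 * d2 < 1.0:
                    add(abs(d1 - d2) / (1.0 - d1 * d2),
                        w1 * w2 * (1.0 - d1 * d2) / 2.0)
    return out


def tree(w, sigma):
    for kind in sigma:
        w = conv(w, w, kind)
    return w


def h2(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)


def capacity(w):
    return sum(p * (1.0 - h2((1.0 - d) / 2.0)) for d, p in w.items())


def bhattacharyya(w):
    return sum(p * sqrt(max(0.0, 1.0 - d * d)) for d, p in w.items())


def bounds(p, q, n):
    """Upper and lower bound of Theorem 5 for tree height n."""
    upper = lower = 0.0
    for sigma in product("01", repeat=n):
        ps, qs = tree(p, sigma), tree(q, sigma)
        upper += min(capacity(ps), capacity(qs))
        lower += max(bhattacharyya(ps), bhattacharyya(qs))
    return upper / 2 ** n, 1.0 - lower / 2 ** n


table = [bounds(bec(0.5), bsc(0.11002), n) for n in range(3)]
```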

Example 7 (Bounds on Compound Rate of BMS Channels):
In the previous example we considered the compound capacity
of two BMS channels. How does the result change if we
consider a whole family of BMS channels? E.g., what is
$C_{P,SC}(\{\text{BMS}(I = 0.5)\})$?

We currently do not know of a procedure (even numerical)
to compute this rate. But it is easy to give some upper and
lower bounds.

In particular we have

$$C_{P,SC}(\{\text{BMS}(I = 0.5)\}) \le C_{P,SC}(\text{BSC}(0.11002), \text{BEC}(0.5)) \le 0.4817,$$

$$C_{P,SC}(\{\text{BMS}(I = 0.5)\}) \ge 1 - Z(\text{BSC}(I = 0.5)) \approx 0.374. \tag{3}$$

The upper bound is trivial. The compound rate of a whole class
cannot be larger than the compound rate of two of its members.
For the lower bound note that from Theorem 5 we know that
the achievable rate is at least as large as $1 - \max\{Z\}$, where
the maximum is over all channels in the class. Since the BSC
has the largest Bhattacharyya parameter of all channels in the
class of channels with a fixed capacity, the result follows.
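The value $0.374$ in (3) can be recovered by locating the BSC of capacity one-half and evaluating its Bhattacharyya parameter. The bisection below is a minimal sketch; the function names and the tolerance are assumptions.

```python
from math import log2, sqrt


def h2(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)


def bsc_crossover_for_capacity(target, tol=1e-12):
    """Crossover probability p in [0, 1/2] with 1 - h2(p) = target (bisection)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - h2(mid) > target:
            lo = mid     # capacity still too high: more noise is needed
        else:
            hi = mid
    return (lo + hi) / 2.0


p_half = bsc_crossover_for_capacity(0.5)
z_bsc = 2.0 * sqrt(p_half * (1.0 - p_half))   # largest Z in the class BMS(I = 0.5)
z_bec = 0.5                                   # BEC(0.5), same capacity, smaller Z
universal_lower_bound = 1.0 - z_bsc
```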



IV. A Better Universal Lower Bound 

The universal lower bound expressed in (3) is rather weak.
Let us therefore show how to strengthen it.

Let $\mathcal{W}$ denote a class of BMS channels. From Theorem 5
we know that in order to evaluate the lower bound we have
to optimize the terms $Z(P^\sigma)$ over the class $\mathcal{W}$.

To be specific, let $\mathcal{W}$ be BMS($I$), i.e., the space of BMS
channels that have capacity $I$. Expressed in an alternative way,
this is the space of distributions that have entropy equal to
$1 - I$.

The above optimization is in general a difficult problem.
The first difficulty is that the space $\{\text{BMS}(I)\}$ is infinite
dimensional. Thus, in order to use numerical procedures
we have to approximate this space by a finite dimensional
space. Fortunately, as the space is compact, this task can be
accomplished. E.g., look at the densities corresponding to the
class $\{\text{BMS}(I)\}$ in the $|D|$-domain. In this domain, each BMS
channel $W$ is represented by the density corresponding to
the probability distribution of $|W(Y \mid 0) - W(Y \mid 1)|$, where
$Y \sim W(y \mid 0)$. For example, the $|D|$-density corresponding to
BSC($\epsilon$) is $\Delta_{1 - 2\epsilon}$.

We quantize the interval $[0, 1]$ using real values $0 = p_1 <
p_2 < \dots < p_m = 1$, $m \in \mathbb{N}$. The $m$-dimensional polytope
approximation of $\{\text{BMS}(I)\}$, denoted by $\mathcal{W}_m$, is the space
of all the densities which are of the form $\sum_{i=1}^{m} \alpha_i \Delta_{p_i}$. Let
$\alpha = [\alpha_1, \dots, \alpha_m]^T$. Then $\alpha$ must satisfy the following linear
constraints:

$$\alpha^T 1_{m \times 1} = 1, \quad \alpha^T H_{m \times 1} = 1 - I, \quad \alpha_i \ge 0, \tag{4}$$

where $H_{m \times 1} = [h_2(\frac{1 - p_i}{2})]_{m \times 1}$, with $h_2$ the binary
entropy function, and $1_{m \times 1}$ is the all-one
vector.
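The constraints (4) are straightforward to check for a candidate weight vector. The sketch below assumes a uniform grid and reads the entropy vector as $H_i = h_2((1 - p_i)/2)$, the entropy of the BSC represented by the mass point $\Delta_{p_i}$; the function names and the tolerance are illustrative assumptions.

```python
from math import log2


def h2(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)


def uniform_grid(m):
    """Quantization points 0 = p_1 < ... < p_m = 1."""
    return [i / (m - 1) for i in range(m)]


def entropy_vector(points):
    """H_i = h2((1 - p_i) / 2)."""
    return [h2((1.0 - p) / 2.0) for p in points]


def in_polytope(alpha, points, capacity, tol=1e-9):
    """Check the three constraints in (4) for the weight vector alpha."""
    entropy = sum(a * h for a, h in zip(alpha, entropy_vector(points)))
    return (abs(sum(alpha) - 1.0) <= tol
            and abs(entropy - (1.0 - capacity)) <= tol
            and all(a >= -tol for a in alpha))


points = uniform_grid(5)
bec_half = [0.5, 0.0, 0.0, 0.0, 0.5]   # BEC(0.5): mass 1/2 at p = 0 and at p = 1
noiseless = [0.0, 0.0, 0.0, 0.0, 1.0]  # all mass at p = 1, capacity 1
```

Here `bec_half` lies in the polytope for $I = 0.5$, while `noiseless` violates the entropy constraint.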

Due to quantization, there is in general an approximation
error.

Lemma 8 ($m$ versus $\delta$): Let $a \in \text{BMS}(I)$. Assume a uniform
quantization of the interval $[0, 1]$ with $m$ points $0 = p_1 <
p_2 < \dots < p_m = 1$. If $m \ge 1 + \frac{1}{1 - \sqrt{1 - \delta^2}}$, then there exists a
density $b \in \mathcal{W}_m$ such that $|Z(a \boxplus a) - Z(b \boxplus b)| \le \delta$.

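Proof: For a given density $a$, let $Q_u(a)$ ($Q_d(a)$) denote
the quantized density obtained by mapping the mass
in the interval $(p_i, p_{i+1}]$ ($[p_i, p_{i+1})$) to $p_{i+1}$ ($p_i$). Note that
$Q_u(a)$ ($Q_d(a)$) is upgraded (degraded) with respect to $a$.
Thus, $H(Q_u(a)) \le H(a) \le H(Q_d(a))$. The Bhattacharyya
parameter $Z(a \boxplus a)$ is given by

$$Z(a \boxplus a) = \int_0^1 \int_0^1 \sqrt{1 - x_1^2 x_2^2}\; a(x_1)\, dx_1\, a(x_2)\, dx_2.$$

Since $\sqrt{1 - x^2}$ is decreasing on $[0, 1]$, we have

$$Z(Q_d(a) \boxplus Q_d(a)) - Z(a \boxplus a) \le \sum_{i,j=1}^{m-1} \int_{p_i}^{p_{i+1}} \int_{p_j}^{p_{j+1}} \left( \sqrt{1 - (p_i p_j)^2} - \sqrt{1 - (p_{i+1} p_{j+1})^2} \right) a(x)\, dx\, a(y)\, dy,$$

$$Z(a \boxplus a) - Z(Q_u(a) \boxplus Q_u(a)) \le \sum_{i,j=1}^{m-1} \int_{p_i}^{p_{i+1}} \int_{p_j}^{p_{j+1}} \left( \sqrt{1 - (p_i p_j)^2} - \sqrt{1 - (p_{i+1} p_{j+1})^2} \right) a(x)\, dx\, a(y)\, dy.$$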

Now note that the maximum approximation error, call it $\delta$,
happens when $x$ is close to 1. This maximum error is equal
to

$$\sqrt{1 - \left(1 - \frac{1}{m - 1}\right)^2}.$$

Solving for $m$ we see that the quantization error can be made
smaller than $\delta$ by choosing $m$ such that

$$m \ge 1 + \frac{1}{1 - \sqrt{1 - \delta^2}}. \tag{5}$$
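To get a feel for the grid sizes involved, the sketch below evaluates the condition on $m$, assuming it reads $m \ge 1 + 1/(1 - \sqrt{1 - \delta^2})$ and that the corresponding worst-case error for a uniform step $1/(m-1)$ is $\sqrt{1 - (1 - 1/(m-1))^2}$. Both formulas are taken as assumptions here, and the function names are illustrative.

```python
from math import ceil, sqrt


def min_points(delta):
    """Smallest integer m with m >= 1 + 1 / (1 - sqrt(1 - delta^2))."""
    return ceil(1.0 + 1.0 / (1.0 - sqrt(1.0 - delta * delta)))


def max_step_error(m):
    """Assumed worst-case error for a uniform grid with m points,
    attained on the last cell next to 1."""
    h = 1.0 / (m - 1)
    return sqrt(1.0 - (1.0 - h) ** 2)


m_needed = min_points(0.1)   # roughly 2 / delta^2 points for small delta
```

For small $\delta$ the required number of points scales like $2/\delta^2$, so halving the target error roughly quadruples the grid.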

Note that if $a \in \mathcal{W}$ then in general neither $Q_u(a)$ nor $Q_d(a)$
are elements of $\mathcal{W}_m$, since their entropies do not match. In
fact, as discussed above, the entropy of $Q_d(a)$ is too high,
and the entropy of $Q_u(a)$ is too low. But by taking a suitable
convex combination we can find an element $b \in \mathcal{W}_m$ for which
$Z(b^{\boxplus 2})$ differs from $Z(a^{\boxplus 2})$ by at most $\delta$.

In more detail, consider the function $f(t) = H(tQ_u(a) +
(1 - t)Q_d(a))$, $0 \le t \le 1$. Clearly, $f$ is a continuous function
on its domain. Since every density of the form $tQ_u(a) + (1 - t)Q_d(a)$
is upgraded with respect to $Q_d(a)$ and degraded with
respect to $Q_u(a)$, we have
$Z((Q_u(a))^{\boxplus 2}) \le Z((tQ_u(a) + (1 - t)Q_d(a))^{\boxplus 2}) \le Z((Q_d(a))^{\boxplus 2})$.
As a result, $|Z((tQ_u(a) + (1 - t)Q_d(a))^{\boxplus 2}) - Z(a^{\boxplus 2})| \le \delta$.
We further have $f(1) = H(Q_u(a)) \le H(a) \le H(Q_d(a)) = f(0)$.
Thus there exists a $0 \le t_0 \le 1$ such that $f(t_0) = H(a) = 1 - I$.
Hence, $t_0 Q_u(a) + (1 - t_0)Q_d(a) \in \text{BMS}(I)$ and
$t_0 Q_u(a) + (1 - t_0)Q_d(a) \in \mathcal{W}_m$.
Therefore $t_0 Q_u(a) + (1 - t_0)Q_d(a)$ is the desired density. ∎
Example 9 (Improved Bound for BMS($I = \frac{1}{2}$)): Let us derive
an improved bound for the class $\mathcal{W} = \text{BMS}(I = \frac{1}{2})$.
We pick $n = 1$, i.e., we consider tree channels of height 1 in
Theorem 5.

For $\sigma = 1$ the implied operation is $\circledast$. It is well known
that in this case the maximum of $Z(a \circledast a)$ over all $a \in
\mathcal{W}$ is achieved for $a = \text{BSC}(0.11002)$. The corresponding
maximum $Z$ value is 0.3916.

Next consider $\sigma = 0$. This corresponds to the convolution
$\boxplus$. Motivated by Lemma 8, consider at first the maximization
of $Z$ within the class $\mathcal{W}_m$:



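$$\text{maximize:} \quad \sum_{i,j} \alpha_i \alpha_j Z(\Delta_{p_i} \boxplus \Delta_{p_j}) = \sum_{i,j} \alpha_i \alpha_j \sqrt{1 - (p_i p_j)^2}$$

$$\text{subject to:} \quad \alpha^T 1_{m \times 1} = 1, \quad \alpha^T H_{m \times 1} = \frac{1}{2}, \quad \alpha_i \ge 0. \tag{6}$$

In the above, since the $p_i$s are fixed, the terms $\sqrt{1 - (p_i p_j)^2}$
are also fixed. The task is to optimize the quadratic form
$\alpha^T P \alpha$ over the corresponding $\alpha$ polytope, where the $m \times m$
matrix $P$ is defined as $P_{ij} = \sqrt{1 - (p_i p_j)^2}$. We claim that
this is a convex optimization problem.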

To see this, expand $\sqrt{1 - x^2}$ as a Taylor series in the form

$$\sqrt{1 - x^2} = 1 - \sum_{l > 0} t_l x^{2l}, \tag{7}$$

where the $t_l \ge 0$. We further have

$$\alpha^T P \alpha = \sum_{i,j} \alpha_i \alpha_j \sqrt{1 - (p_i p_j)^2} = 1 - \sum_{l > 0} t_l \Big( \sum_i \alpha_i p_i^{2l} \Big)^2. \tag{8}$$

Thus, since $t_l \ge 0$ and the $p_i$s are fixed, each of the
terms $-t_l (\sum_i \alpha_i p_i^{2l})^2$ in the above sum represents a concave
function. As a result the whole function is concave.
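The sign claim on the Taylor coefficients can be verified directly. From the binomial series for $(1 - u)^{1/2}$ one obtains $t_1 = \frac{1}{2}$ and $t_{l+1} = t_l \cdot \frac{2l - 1}{2l + 2}$, so all coefficients are positive. The sketch below computes them exactly with rational arithmetic and checks the truncated series against $\sqrt{1 - x^2}$; the function names and the truncation length are assumptions.

```python
from fractions import Fraction
from math import sqrt


def taylor_coefficients(count):
    """First `count` coefficients t_l in sqrt(1 - x^2) = 1 - sum_l t_l x^(2l)."""
    coeffs = []
    t = Fraction(1, 2)
    for l in range(1, count + 1):
        coeffs.append(t)
        t = t * Fraction(2 * l - 1, 2 * l + 2)   # ratio of consecutive coefficients
    return coeffs


def sqrt_series(x, count=200):
    """Truncated series; accurate for |x| well inside the unit interval."""
    return 1.0 - sum(float(t) * x ** (2 * l)
                     for l, t in enumerate(taylor_coefficients(count), start=1))


first_four = taylor_coefficients(4)   # 1/2, 1/8, 1/16, 5/128
```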

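To find a bound, let us relax the condition $0 \le \alpha_i \le 1$
and admit $\alpha_i \in \mathbb{R}$. We are thus faced with solving the convex
optimization problem

$$\text{maximize:} \quad \alpha^T P \alpha$$

$$\text{subject to:} \quad \alpha^T 1_{m \times 1} = 1, \quad \alpha^T H_{m \times 1} = \frac{1}{2}.$$

The Kuhn-Tucker conditions for this problem yield

$$\begin{bmatrix} 2P & 1 & H \\ 1^T & 0 & 0 \\ H^T & 0 & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} 0_{m \times 1} \\ 1 \\ \frac{1}{2} \end{bmatrix}. \tag{9}$$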



As P is non-singular, the answer to the above set of linear 
equations is unique. 
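For a small grid, the linear system (9) can be assembled and solved directly. The sketch below is an illustration only: it uses a coarse uniform grid of five points, reads $H_i$ as $h_2((1 - p_i)/2)$, and relies on a hand-written Gaussian elimination with partial pivoting. It makes no attempt to reproduce the numerical value reported in the text, which would require a much finer grid and a better-conditioned solver.

```python
from math import log2, sqrt


def h2(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)


def solve(a, b):
    """Gaussian elimination with partial pivoting for the square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def relaxed_maximum(m_points, entropy=0.5):
    """Solve system (9) on a uniform grid; returns (alpha, alpha^T P alpha, H)."""
    p = [i / (m_points - 1) for i in range(m_points)]
    P = [[sqrt(1.0 - (pi * pj) ** 2) for pj in p] for pi in p]
    H = [h2((1.0 - pi) / 2.0) for pi in p]
    size = m_points + 2
    kkt = [[0.0] * size for _ in range(size)]
    for i in range(m_points):
        for j in range(m_points):
            kkt[i][j] = 2.0 * P[i][j]
        kkt[i][m_points] = kkt[m_points][i] = 1.0          # multiplier of sum(alpha) = 1
        kkt[i][m_points + 1] = kkt[m_points + 1][i] = H[i]  # multiplier of alpha . H = 1/2
    rhs = [0.0] * m_points + [1.0, entropy]
    alpha = solve(kkt, rhs)[:m_points]
    value = sum(alpha[i] * P[i][j] * alpha[j]
                for i in range(m_points) for j in range(m_points))
    return alpha, value, H


alpha, value, H = relaxed_maximum(5)
```

Because the objective is concave on the affine set defined by the two equality constraints, the stationary point found this way is the maximizer of the relaxed problem; in particular its value cannot be smaller than the value $0.75$ attained by the feasible point corresponding to BEC(0.5).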

We can now numerically compute this upper bound, and
from Lemma 8 we have an upper bound on the estimation
error due to quantization. We get an approximate value of
0.799. We conclude that

$$C_{P,SC}(\{\text{BMS}(I = 0.5)\}) \ge 1 - \frac{1}{2}(0.392 + 0.799) = 0.404.$$

This slightly improves on the value 0.374 in (3). In principle
even better bounds can be derived by considering values of $n$
beyond 1. But the implied optimization problems that need to
be solved are non-trivial.

V. Conclusion and Open Problems 

We proved that the compound capacity of polar codes under 
SC decoding is in general strictly less than the compound 
capacity itself. It is natural to inquire why polar codes combined
with SC decoding fail to achieve the compound capacity.
Is this due to the codes themselves or is it a result of the 
sub-optimality of the decoding algorithm? We pose this as an 
interesting open question. 

In [6] polar codes based on general $\ell \times \ell$ matrices $G$ were
considered. It was shown that suitably chosen such codes have
an improved error exponent. Perhaps this generalization is also
useful in order to increase the compound capacity of polar
codes.